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Abstract 

We analyse base-pair breathing in a DNA sequence of 12 base-pairs with a defective 
base at its centre. We use both all-atom molecular dynamics (MD) simulations and 
a system of stochastic differential equations (SDE). In both cases, Fourier analysis of 
the trajectories reveals self-organised critical behaviour in the breathing of base-pairs. 
The Fourier Transforms (FT) of the interbase distances show power-law behaviour with 
gradients close to —1. The scale-invariant behaviour we have found provides evidence 
for the view that base-pair breathing corresponds to the nucleation stage of large-scale 
DNA opening (or 'melting') and that this process is a (second-order) phase transition. 
Although the random forces in our SDE system were introduced as white noise, FTs 
of the displacements exhibit pink noise, as do the displacements in the AMBER/MD 
simulations. 
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1 Introduction 



Much has been written about the process by which double-stranded DNA becomes two sepa- 
rated single strands, see, for example, (jH; |i|). Several authors refer to the melting of DNA as 
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having the form of a phase transition, in which the opening of one or a few base-pairs is akin 
to nucleation, and the subsequent growth of open 'bubbles' is similar to the growth of crystal 
nucleii. This two-stage process is relatively well understood in the contexts of crystal growth, 
and aerosol formation, but less so in kinetics of DNA replication. The aim of this paper is to 
show that the nucleation event - that is the initial opening of bases - exhibits self-similar, or 
scale-free, critical behaviour, as one would expect at a phase transition. Whilst other studies 
have analysed bubble growth and the statistics of bubble length, we choose to focus on the 
temporal statistics. Our model is more detailed, only focusing on 12 base-pairs and the open- 
ing of the first base; our model is thus significantly smaller than the bubble growth models of 
(El; 0); however, our models are more accurate in that the AMBER simulations (3) (Assisted 
Model Building with Energy Refinement) include the effect of every atom in the DNA and 
the water molecules in the environment, and our SDE models include an accurate fitting of 
the nonlinear inter-base potential energy as described in our earlier work (0). 

According to Watson Sz Crick (7), the structure of a DNA duplex consists of two chains 
of bases. These bases are of four types: the purines Adenine (A) and Guanine (G) and the 
pyrimidines Cytosine (C) and Thymine (T). Along the chains, the bases are linked by covalent 
bonds, while the opposite bases from the two chains pair together by two or three hydrogen 
bonds forming base-pairs. Only A-T or C-G pairs are possible. Given this information, we 
use a lattice representation for our DNA sequence, as illustrated in Figure HJ Breathing - the 
localised separation of complementary bases - takes place on the microsecond timescale in 
normal DNA, which is beyond the range of all-atom molecular dynamics (MD) packages. The 
insertion of a defect, that is, replacing a thymine (T) with a difluorotoluene (F) base at the 
lattice site n = 0, increases the frequency of breathing due to the weakening of the inter-chain 
potential. This makes breathing occur on the nanosecond timescale and hence it becomes 
accessible to MD simulation techniques. Biologically the reason for considering the inclusion 
of a defect is to study fidelity, and the effects of errors, in DNA replication. The defect F has 
been considered previously, for example, by Cubero et al. (8), who considered such a system 
without any externally imposed twist. It is possible that proteins may locally alter the twist 
of a DNA helix in order to ease the process of localised melting. Hence, here, we impose a 
twist on the DNA structure in order to investigate its effect on base-pair breathing. 

u-%{b\s*^ *\ uo(t)/ \u 2 (t) 




Figure 1: Illlustration of the mesoscopic model of DNA. 



In this paper, we analyse base-pair breathing at a defect in double-stranded DNA parame- 
terised using the molecular dynamics (MD) simulation tool AMBER. We consider a range 
of helicoidal twist angles from 30° to 40° per base-pair at rest, including the typical twist 
of 36°. In (jl) we derived a coarse-grained model based on stochastic differential equations 
with one variable for the displacement of each base from its equilibrium position. This model 
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includes explicit random forcing terms, which model the interactions of water molecules with 
the bases, and the whole model has been parameterised to the all-atom MD simulations de- 
rived using AMBER. Developed from the models of (JBI; 0), the incorporation of noise and 
damping terms enables us to investigate the temporal dynamics of breathing. The model is 
fitted to AMBER data using a sophisticated maximum likelihood estimation procedure which 
is described in (jH) and summarised in Section I2T21 here. The associated fluctuation-dissipation 
relation, the dependence of parameters on angle of twist and form of the inter-base potential 
are also discussed in 



The DNA sequence under study consists of 12 base-pairs and we have taken that the DNA 
sequence is surrounded by a water box. The analysis of the simulations of a DNA molecule 
obtained using AMBER and our SDE system presented in (0) revealed that the amplitude of 
fluctuations is slightly reduced in the SDE model, but the breathing length and frequency are 
similar. In addition, the derivation of the SDE model requires us to modify the fluctuation- 
dissipation relation in the reduced, or coarse-grained, mesoscopic models. We also make a 
distinction between the potential of mean force (the free energy) and the various potential 
energies in our system. This is supported by an analysis of the importance of the damping 
term in preserving the system energy (which, in combination with the forcing noise terms 
gives rise to the entropic component of the free energy) and the way in which the along- 
chain interactions influence the length of a breathing event. We have also shown in (jij) that 
breathing events are due not only to inhomogeneities in the inter-strand interactions, but 
also to a significant change in along-chain interactions and the helical twist of the DNA, 
which potentially influences the interactions between the DNA molecule and the surrounding 
solvent. 

A more detailed analysis of our simulations presented below reveals an interesting result: 
rather than a well-defined breathing frequency that depends on the twist angle via the energy 
barrier between open and closed states of the central basepair, we find that at all angles of twist 
the DNA exhibits breathing across a wide range of frequencies and the amplitude-frequency 
relationship exhibits scale-free behaviour. Most previous results in the literature show that 
the energy is transferred between nonlinear localized modes with particular frequencies and 



between such modes and phonons, as discussed by Peyrard et al. (|9|; lid; llll) or as shown by 



the results of Gaeta et al. (1121 ; Il3t for example. 



In the rest of this section we review some of the relevant theory and applications involv- 
ing self-organised systems. In Section [2] we outline the models we use and the simulation 
techniques. Section [3] contains the results of both approaches, showing the consistency of 
outcomes. Section [4] concludes the paper with a summary and discussion. 



1.1 Self-organisation and criticality 



Many dynamical systems evolve to a steady state (or an equilibrium) solution or to a limit 
cycle. More complicated large-time behaviour includes spatio-temporal chaos which is often 
characterised by strange attractors in phase space (Il5l ). Through the study of cellular au- 
tomata, Wolfram (Il6l ) characterised large time behaviour into four classes. He empirically 
identified the following qualitative classes: spatially homogeneous systems (akin to steady- 
states), periodic structures (limit cycles), systems with chaotic aperiodic behaviour (chaos) 
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and, most notably, a fourth, that of complicated localised and possibly propagating structures. 
We claim that this last category is the most appropriate classification for the behaviour which 
we observe later in this paper, and, more precisely, this behaviour corresponds to self-organised 
criticality (SOC). 

In physics, a critical point specifies the conditions, such as temperature, pressure or composi- 
tion, at which a phase boundary is not valid anymore. Here, by 'phase' we understand a state 
of a system for which the physical properties of a component are uniform. As one approaches 
the critical point, the properties of the different phases approach each other. In other words, 
a critical point refers to a system configuration to which the system evolves without ever 
approaching a fixed equilibrium state. Mathematically, one can define characteristic sizes 
of behaviour (lengths or timescales of events) which describe the system. When these sizes 
become infinite the system is classed as critical, that is, fluctuations occur at all length scales. 

Systems having the SOC property present a spatially or temporally scale-invariant behaviour 
without the need to tune any parameter to a specific value. This ubiquity of scale-free 



behaviour ( )18l ) in such models shows that complex behaviour can be stable. This contrasts 
with systems where one would generically expect some stable steady behaviour for a wide 
range of parameter values, and as a parameter changes, one might observe either a smooth 
change in the system's behaviour or a bifurcation to a qualitatively different steady-state or 
limit cycle, for example. In the special case when the parameter takes on the threshold value 
between two states, more complex phenomena may be observed. This is a typical form of 
behaviour around a phase transition. In general, the total number of states is finite and the 



transitions can be characterised using a cellular automaton structure (I19l ). 



When analysing a large system, we aim to reduce its complexity to a few degrees of freedom, for 
which the coupling can be defined in a general manner and hence we obtain some averaged, 
or coarse-grained, behaviour over ignored quantities and includes corresponding averaged 
interactions within the system and the surrounding environment. For dynamical systems, 



such a dimensional reduction can be achieved by the "slaving principle" (llTT ) which leads 
to the the study of low- dimensional attractors. This is often a straightforward method. For 
example, "fast modes" at equilibrium can be slaved to a few slowly-evolving modes. However, 
sometimes a system responds on both fast and slow timescales, even at large times, and we 
require an alternative theory, such as the idea of self-organised systems, whose behaviour 
cannot be explained using the slaving principle or other reductions. 

Some attracting critical points of dynamical systems can be characterised using the concept 



of self-organized criticality (SOC), which was first introduced by Bak et al. (122l ; l23l ). Using 
simple automata, they demonstrated power-law relationships and 1/f noise (also known as 
"flicker noise") in spatially extended systems, this behaviour illustrates critical phenomena, 
and underpins more general scale-invariant behaviour and fractals. They studied the dynamics 
of damped pendula and the slope of sandpiles, determining critical points of the systems. 
One aspect of self-organised criticality is the separation of timescales: in the most familiar 
application of sandpiles, grains are continually added on a faster timescale. The gradient 
of the pile slowly steepens and there are avalanches (large-scale reorganisation of the pile) 
which take place rapidly but are separated by large time intervals (relative to the timescale 
at which grains are added to the pile). Bak et al also noted that changing the values of 
system parameters did not affect the emergence of critical behaviour. Avalanches in a one- 



dimensional sandpile are also analysed by Chapman et al. (see, for example, (]24j)) who showed 
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that the distribution of energy discharges due to internal reorganizations have a power-law 
form and so demonstrate that the system is self-organized. 

In general, for a noisy system the power spectrum has the form S(f) = cf~P, where c is a 
constant. The noise present in the system can be classified in three important categories as 
follows: 

• white noise, for {3 = 0; 

• pink noise, for (3 = 1; 

• red noise (also known as Brownian noise), for f3 = 2. 



However, the term "1// noise" is widely used to refer to any noise with a power spectral 
density S(f) oc f' 13 , with < < 2. For 1/f noise that occurs in nature, {3 is usually close 
to 1. 



Although there is no single well-defined class of systems having the SOC property, it is typ- 
ically observed in complex systems with slowly-driven nonequilibrium behaviour, for which 
the causes of an event taking place in a system cannot be explained simply through some 
parameter values. Several studies of SOC show that scale-invariant phenomena can be deter- 
mined at critical points, but not necessarily at any critical point. There are two important 
categories of such phenomena: fractals (1201 ) and power laws (12 ll). Whilst the first category 
involves geometric shapes, which can be split into parts that are reduced-size copies of the 
initial shape, the second deals with frequency-dependent quantities and, hence, is relevant in 
the analysis of some Hamiltonian systems. We note that self-organised systems are always at 
criticality, but not all critical systems are self-organised. 



The r ang e of systems exhibiting critical properties varies from earthquake s (|25t l26l ) and forest- 
fires ( 1271 ; l28l ) to biological systems, such as proteins ( 29t l30l ). the brain (13 ll ) and even DNA. 
Selvam (j32h . for example, studies the distribution of bases in a human DNA sequence and 
shows that the C-G base-pair frequency distribution exhibits a universal inverse power-law 
form. Also, Harris et al. (1331 ) analyse the configurational entropy of a DNA molecule based 
on the entropy estimation for a Gaussian configuration given by Schlitter (1341 ). which helps 
investigate whether a steady state has been reached during a simulation. They show that 
the estimate of the entropy S n depends on the number of data points n and this relation is a 
power law (with exponent between zero and minus one). 



2 Modelling DNA 

In our system, we also have a separation of timescales: there are rapidly oscillating forcing 
terms applied to the DNA chain illustrated in Figure [1] (these model the interactions with 
water molecules and are akin to grains being added to the sandpile) and the occasional larger- 
scale restructuring of the chain as the base-pairs open or close at the start and end of breathing 
events (which are akin to avalanches in sandpiles). We now discuss in detail the modelling 
approaches we adopt. 
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2.1 All- atom MD modelling using AMBER 

As already mentioned, we obtain data from an all atom simulation of DNA using the package 
AMBER (3). The DNA sequence analysed contains 12 base-pairs as follows: 

CTTTTGFATCTT 
GAAAAC ATAGAA 



This sequence is analysed at a constant temperature of T = 293K, in the presence of a 
surrounding water box. The solvent has to be taken into account because it influences the 
displacement of atoms and through other bonds, affects the hydrogen bonds linking the bases 



on the DNA strands (136! ) . Even if the breathing events occur on the nanosecond time-scale and 
the DNA sequence contains only 12 base-pairs, which together with the sugars and phosphate 
groups represent 763 atoms, the number of degrees of freedom in our system is actually 
very large (16682) due to the water box. This means most of the time is spent computing 
information about the solvent, even though this information is not used for our analysis, since 
we focus only on the DNA bases and their dynamics. 

Note that AMBER considers that the normal twist by default is about 32.5°. In order to 
avoid this inconvenience, we have constructed the DNA sequence by considering the degree of 
twist at rest. The degree of twist is preserved by imposing a harmonic restraint on the atoms 
at the end bases. More precisely, we have considered a constant energy (of 1 kcal mol" 1 A~ 2 ) 
and hence a constant force acting on the end bases, in order to keep the DNA atoms close to 
their initial positions. Applying this restraint to the end bases allows the A-F pair to breathe 
and so explore a larger volume of phase space than the other base-pairs. 

The force field used during the AMBER simulations is also important. For a DNA molecule, 
the predefined FF99SB all-atom force field was used. Moreover, AMBER provides several 
water models for residues with name WAT - the default is TIP3P, which was used in our 
simulations. After creating the topology and coordinates files using the AMBER packages 
LEaP and nucgen, we have used SANDER for energy minimization. This process involves 
a structural relaxation, which is necessary because the coordinates file contains some initial 
values that do not guarantee a minimum energy configuration, this reduces the possibility 
of having conflicts or overlapping atoms. In addition to energy minimization, we have also 
performed a few equilibration and MD simulations in which temperature changes. 

Finally, our system is simulated using SANDER. Next, ptraj is used to measure the distance 
between the A-F base-pair as well as the separations of the other base-pairs and the corre- 
sponding velocities. This data is output every 1 ps or 2 fs depending on which simulation 
study is being performed. 



2.2 Stochastic mesoscopic model 

In DNA, the rapid external fluctuation events correspond to random collisions of the bases 
with the water molecules surrounding the biopolymer. These events correspond to pertur- 
bations to the variables y n {t), fluctuations in these displacements may propagate along the 
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Figure 2: The interchain interaction potential E (y ), the total potential energy U(yo) = 
Eo(yo) + \kyli and the potential of mean force PMF(y Q ) = U(yo) + V^b^^oZ/o used in (jlj), 
for a 36° twisted DNA sequence. 



chain, perturbing neighbouring bases, eventually causing the central base yo(t) to cross a 
barrier in the potential E (y ), see Figure EJ 



A second source of data is the reduced model introduced in p) which is based on a system 
of stochastic ordinary differential equations (SDEs). We start from the deterministic system 

H = ^ {\ii 2 n + \v 2 n + \"i{u n - v n ) 2 + \k{u n+ i - u n f + \k(v n+ i - v n ) 2 } 

n 

+ l(k - k) [(«! - U ) 2 + (uq - M-l) 2 + {V X - V ) 2 + (Vo - V-!) 2 ] 

+E (y ) - i 7 (uo - ^o) 2 , (1) 

which is illustrated in Figured] Here, u n (t) represent the deviations from equilibrium of the 
bases on one chain of the DNA, and v n (t) the displacements of the bases on the second chain, 
with the index n determining the position down each chain (—6 < n < 5). Since we are 
interested in investigating in more detail a system similar to that of Guckian et al. (1371 ) we 
place a defect at the centre of the chain, that is, the uq and vq bases correspond to the A-F 
base-pair. The parameter k describes the stiffness of the backbone down each side of the 
double-helix structure, whist the parameter 7 indicates the strength of interaction between a 
base on one strand and its complement. These forces are assumed to be uniform along the 
DNA double helix, except at the defect where they are replaced by the parameter k and the 
nonlinear force E' (y ) respectively. To these we add noise and damping terms with coefficients 
e* and r]* respectively. The dependence of all these parameters on the twist angle 9 has been 
determined in an earlier paper (Q). We make the transformation y n = u n — v n so as to obtain 
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equations of motion for the distances between base-pairs 

= k(y n+ i-2y n +y n „ l ) -yy n -rj— ^ + e£„, (\n\ > 1), (2) 

= k(y -y-i) - k(y-!-y- 2 ) - lV-\ - + e C-i, (3) 

■^p- = «(l/i- 2 2/o+2/-i) - "j— (l/q) - m-fa + e o£o, (4) 

^ = k(y 2 -yi) -k(yi-y ) -jyi _ ^^ + e ^- ( 5 ) 

The functions £ n (£) are white noise forcing terms. The quantities 7, k, k, e, eo, T], t] are 
all fitted using a maximum likelihood estimation procedure, as is the interaction potential 
E (y ), a full description of this is given in (0; 35). A typical example of the multiwelled 
interaction potential, E (y ) is given in Figure [2j Note that the total potential energy of the 
central defective base-pair is U(y ) = E (y ) + ^ky^, and the potential of mean force (free 
energy) is given by PMF(y ) = U(y ) + Vk^Tr] y , can easily be obtained from simulations 
by plotting a histogram of binned displacement data. 



3 Results 

We measure the distance between the two bases (A, F) at the defect over many nanoseconds, 
that is, we sample yo(t) at specific intervals. Using data from both AMBER and the SDE 
system we analyse the (discrete) Fourier transforms of yo(t) for a range of twist angles from 
30° to 40° per base-pair. 

The discrete Fourier transform is taken of the data, in the figures displayed later, we use the 
notation DFT{uj) for yo(u), and the hypothesis we are testing is that over a wide range of u, 

\ogy (u) = -/31ogw + logC. (6) 

The highest frequency attainable is w m£K = 27r/2At, whilst the lowest frequency is given by 
2n/T where T is the length of the simulation and At is the sampling interval. Our initial 
results are simulations of length approximately T = 10 ns sampled every At = 1 ps (giving 
~ 10 4 data points). We then perform simulations of T = 2 ns sampled on a much finer scale 
of At = 2 fs (giving ~ 10 6 data points). These are performed using both AMBER and our 
SDE system. The SDE system is then subjected to a longer-time simulation of T = 100 ns, 
sampled every At = 1 ps (~ 10 5 data points). This enables a wide range of frequencies to be 
sampled: for the initial simulations 0.00063 < 00 < 3.14 ps -1 (—7.4 < logw < 1.14), for the 
more frequently sampled simulations 0.0032 < u < 1600 (—5.7 < logo; < 7.4), and for the 
long simulations 2n x 10~ 5 < to < 3.14 (-9.7 < logw < 1.14). 

Other frequencies which might be of relevance in interpreting the results are the upper and 
lower limits of the phonon band (which we refer to as u opt and Co> ac respectively), and the 
frequency of the defect mode. Assuming that the defect mode can be approximated by 
jjo = — EQ(0)y we find Eq(0) = 14 and Co>def = 3.7, hence logc^def = 1-3. Since 120 < 7 < 165 
and 160 < 7 + 4k < 210, we have u; ac = 11 and u opt = 14.5, implying logo; ac = 2.4 and 
logo; p t = 2.7. These all lie well above the range of frequencies that we shall be interested in 
below. 
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3.1 Initial results 




Figure 3: Discrete Fourier transforms (power spectra) of yo(t) plotted against uona log-log 
scale. Data sampled every 1 ps for 10 ns from a simulation of a 38° twisted DNA sequence 
from (a) AMBER, and (b) the SDE system 

In Figure [31 we present the log-log plot of the discrete Fourier transform (DFT) against the 
frequency u, for 2 13 data points, that is, yo(t) sampled every 1 ps for 8 ns, for a 38° overtwisted 
DNA sequence. This Figure shows a straight line fit over several orders of magnitude of u 
(—7 < log e w < —1), with gradients of 0.7-0.9 which are close to —1. Similar results are 
obtained for a 34° twisted DNA sequence, for example, as can be seen in Figure HI These 
results suggests that there are breathing events of, and separated by, arbitrarily large times. 
Thus, if a DNA strand was successively observed for increasingly long intervals of time, there 
would always be breathing events of duration comparable to the total observation time. The 
gradients of the log e DFT(a;) against log e o; lines for the full range of twist angles tested are 
summarised in Table [1] and suggest the presence of generalised 1/ / noise in our data (that is, 
Ijf with < /3 < 2). 

Initially our aim was to identify the dominant frequencies of breathers at the defect through 
the interchain distances. The asymptotic results of (@) initially appear to suggest that 
breathers are time-periodic modes with well-defined frequencies; however, that theory actually 
predicts a one-parameter family of "in-phase" breather modes with frequencies occupying the 
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Figure 4: Discrete Fourier transforms (power spectra) of yo(t) plotted against uona log-log 
scale. Data sampled every 1 ps for 10 ns from a simulation of a 34° twisted DNA sequence 
from (a) AMBER, and (b) the SDE system 



Angle 


Pamber 


PsDE 


30° 


0.725 


0.750 


32° 


0.700 


0.725 


33° 


0.725 


0.775 


34° 


0.750 


0.825 


35° 


0.750 


0.775 


36° 


0.775 


0.825 


38° 


0.700 


0.875 


40° 


0.700 


0.700 



Table 1: The gradient /3 of the log-log representation of DFT(y ) against u, from ~ 10 ns of 
data, sampled every 1 ps obtained from the AMBER and SDE models. 

full range of values from the bottom of the phonon band down to arbitrarily small frequencies 
(as well as a one-parameter family of "out of phase" breathers with frequencies above the 
top of the phonon band). Since the part of the frequency range that we are interested in 
here is the small-w limit, it is the former, in-phase, family that concerns us here. We assume 
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that a combination of the stochastic forcing noise and nonlinear interactions of phonons with 
the breathers which changes their frequency over time and may even sporadically create and 
destroy the breather modes. 



3.2 More refined results 

By decreasing the sampling interval (At), we expect to obtain more accurate results; the cost 
being the increase in data storage requirements. Analysing data sampled every 2 fs for 2.1 
ns (more precisely, 2 20 or 10 6 data points), we obtain qualitatively similar results, as can be 
seen in the results presented in Figure EJ which shows the Fourier power spectra for a DNA 
helix twisted to 34°. 

(a) 
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log(co) 
(b) 
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Figure 5: Log-log plots of the discrete Fourier transforms (power spectrum) from more refined 
samplings of yo(t), specifically, data was sampled every 2 fs for 2.1 ns. Results for a 34° 
undertwisted DNA sequence from (a) AMBER and (b) the SDE system (J2J) — (J5j) . 

As with the results shown in Section IBTTj the full range of twist angles from 30° to 40° have been 
simulated and analysed the results are summarised in Table [2j These results, with the reduced 
sampling interval, suggest that the average value of /3 for such AMBER simulations increases 
to 0.93, which is much closer to unity than the values obtained when the sampling interval 
is At = lps (Tabled]). Analysing a similar data set from the SDE model with more frequent 
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sampling shows a similar increase in (3, from an average of 0.79 when At = lps (Table [T]) 
to At = 0.91 (Table |2]). These results again confirm that our reduced mesoscopic model 
accurately reproduces the self-organised DNA behaviour observed in the all-atom AMBER 
simulation. This observed increase in (3 suggests that we might reasonably expect (3 — » 1 as 
At ->■ in both the AMBER and the SDE simulations. 



Angle 


Pamber 


PsDE 


30° 


0.920 


0.900 


32° 


0.920 


0.895 


33° 


0.900 


0.910 


34° 


0.940 


0.920 


35° 


0.930 


0.900 


36° 


0.930 


0.910 


38° 


0.940 


0.940 


40° 


0.950 


0.905 



Table 2: The (3 exponents derived from more refined sampling of yo(t), specifically every 2 fs 
for 2.1 ns. The /3-exponent is the gradient of the log-log representation of DFT(y ) against 
u), from AMBER and SDE simulations (compare with Tabled]). 

At higher values of u, both AMBER and SDE simulations show changes in behaviour. Firstly, 
in both cases, the line broadens as more points are plotted at higher values of logo; and 
these display a greater variation; see the ranges — 1 < \ogu < 2 in Figures [3] and [4] and 
— 1 < logo; < 6 in Figure [51 Note that the same ranges apply to both AMBER simulations 
and SDE simulations. Secondly, at larger u, there is a more significant reduction in the 
discrete Fourier transform, for the AMBER simulation this occurs from u = 2 upwards, 
following by a more abrupt decrease around u = 6.5 in Figure E^a); whereas, in the SDE 
simulation, Figure [5(b), the spectrum has a simpler form with a more rapid linear decrease 
with a larger gradient beyond u = 2.5. 

3.3 Long-time results 

We have analysed a longer simulation of 100 ns, sampling yo(t) every At = 1 ps using just 
the SDE system; such a long simulation is beyond the scope of AMBER on currently avail- 
able computing facilities. This length of simulation allows lower breathing frequencies to be 
sampled, as shown in Figures [6] and [71 Here we observe the same self-organised behaviour as 
in earlier graphs, although with some deviation from the straight line at particularly small 
frequencies, namely those in the range —9 < logu < —7. 

For example, from the 2 16 (~ 10 5 ) data points in the simulation of a 30° undertwisted DNA 
sequence illustrated in Figure [6j we find (3 = 0.725. This value is similar to that found in the 
shorter simulation of 10 ns sampled every 1 ps, where we found (3 = 0.750; the value of 0.725 
is identical to the value obtained from the shorter AMBER simulation; both these values are 
reported in Table [U 

For the 38° overtwisted DNA molecule the long-time SDE simulation gives (3 = 0.775, which 
lies between the value of (3 = 0.875 from the shorter SDE simulation and (3 = 0.700 from the 
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shorter AMBER simulation. Thus for both twist angles, the long-time SDE simulation gives 
exponents closer to the AMBER results than the shorter SDE simulations. 
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log(co) 

Figure 6: Log- log plot of the discrete Fourier transform (power spectrum) of yo(t) from a 
long-time (100 ns) SDE simulation for a 30° twisted DNA sequence, Uo(t) data sampled every 
1 ps. 



3.4 Bases away from the defect 

Finally we have analysed the motion of the base-pairs adjacent and further away from the 
defect in the AMBER and SDE systems, in both cases using the example of a 38° overtwisted 
DNA helix. For example, Figure [8] illustrates the power spectrum of the second-neighbour 
base-pair y 2 t, sampled every 1 ps over a simulation of length 10ns. Although the decay with 
increasing frequency (co) is not as clear as in Figures [31 SJ El or it is still possible to fit 
a straight line through the points. For the AMBER and SDE simulations respectively, the 
gradients of these lines are 0.225 and 0.210 respectively. 

This procedure has been repeated for the first neighbour base-pair, yi(t), and more distant 
base-pairs, y^ (t), yi(t), and the corresponding gradients of the best fit lines of the log-log 
plots have been calculated. Assuming that the FTs have power-law forms, these gradients, 
which correspond to the exponents, /3, are given in Table [31 From the data in this Table we 
observe that /3 decreases as one moves away from the defect site. This behaviour is due to the 
reduced influence of the breathing pair on the neighbouring base-pairs. We observe a drop 
from /3 « 1 at the defect (n = 0) to just under one quarter at the nearest neighbour (n = 1) 
in both SDE and AMBER. 

The analysis of yo(t) showed that the exponent (3 increased from 0.7 — 0.8 to 0.90 — 0.95 
when the sampling frequency was decreased from lps to 2 fs, suggesting convergence to = 1 
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log(co) 

Figure 7: Log- log plot of the discrete Fourier transform (power spectrum) of yo(t) from a 
long-time (100 ns) SDE simulation for a 38° twisted DNA sequence, yo(t) data sampled every 
1 ps. 



Base-pair 


Pamber 


PsDE 


yi(t) 


0.225 


0.210 


V2{t) 


0.180 


0.135 




0.180 


0.130 




0.180 


0.130 



Table 3: The gradient /3 of the log-DFT function, from AMBER and SDE data sampled every 
1 ps for 10 ns. 



Base-pair 


Pamber 


PsDE 


yi(t) 


0.450 


0.445 


V2(t) 


0.425 


0.205 


v*(t) 


0.410 


0.180 


V4(t) 


0.390 


0.155 



Table 4: The gradient ft of the log-DFT function, from AMBER and SDE data sampled every 
2 fs for 2 ns. 

in the limit of small sampling frequency. Hence, we attempt to find more accurate values 
for the /9-exponent for the neighbouring bases by decreasing the sampling timestep from 1 
ps to 2 fs (as we reduce the simulation length from 10ns to 2ns). We obtain the gradients 
given in Table HI Summarising, we find values just under one half at the nearest neighbour 
(n — 1) in both SDE and AMBER. For y 2 , in AMBER, there is then a further slow decay of 
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Figure 8: Discrete Fourier transforms (power spectra) of y 2 (t) plotted against uona log-log 
scale. Data sampled every 1 ps for 10 ns from a simulation of a 38° twisted DNA sequence 
from (a) AMBER, and (b) the SDE system j2}-©. 

(9(0.01) per base-pair; whereas in SDE there is a more significant drop to 0.2 for y 2 and then 
a slow decrease of 0(0.01) per base-pair. We observe once again that these later results with 
a decreased sampling timestep of At = 2 fs increases the measured values of (3. Hence, we 
recommend that the even with spatially coarse-grained models of DNA, kinetic characteristics 
should be analysed using as small a timestep as possible, even as small as 2 fs (as used in 
AMBER) in order to obtain correct results and conclusions. 

This extra decrease in the SDE system may be due to the reduced number of degrees of 
freedom (only one per base-pair in the SDE), whereas in AMBER there are O(10 2 ) degrees 
of freedom per base-pair (the bases having 15 atoms moving in 3D space, in addition to the 
phosphate backbone). Whilst all base-pairs receive energy in the form of white noise forcing 
(in the SDE system) and in the form of random collisions with water molecules (AMBER), 
this energy is used and dissipated differently in the defective base from its neighbours. At 
the defect, there is a change in temporal behaviour, since white input noise (£o) is converted 
to pink output (y ) giving (3 ~ 1, whereas in the neighbouring bases, the output noise (y n ) 
remains significantly closer to white, that is /3 is significantly smaller. 
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4 Conclusions 



In summary, we have simulated a sequence of 12 base-pairs of DNA using two different models 
for a variety of times up to 100ns. We have analysed the displacement between the defective 
base-pair near the centre of the chain. Fourier transforms of the distance trajectory taken 
from both the AMBER and the mesoscopic stochastic differential equation simulations exhibit 
scale-free, or critical, behaviour for all twist angles in the range 30°-40° per base-pair. From 
(jSJ), we have yo(uj) = Cuj~® across a considerable range of frequencies, u. Furthermore, we 
find that (3 = 1 for all twist angles. Although we have imposed white noise forcing in the 
system the noise observed in the output yo(t) is pink (/3 ph 1, Figure [3]). This shows 

that our SDE system preserves the SOC properties of DNA observed in the fully deterministic 
all-atom AMBER simulations. 

It is suspected that proteins which interact with DNA overtwisting or undertwisting the 
structure in order to ease the release of bases out from the structure. The fact that = 1 
for all twist angles appears to suggest that such a strategy will not change the base-pair 
breathing. However, the constant C in the formula yo{oo) = Cu' 13 will depend on twist angle, 
and mean that the fraction of time spent in the breathing state is less for the more stable 
angles 35° — 36° and more for the overtwisted or undertwisted DNA structures (i.e those in 
the ranges 38° — 40° and 30° — 34° respectively). 

What is surprising about our results is that the critical behaviour is not specific to any one 
twist angle but occurs at all angles. One might expect that, at normal twist angles of 35°-36°, 
stable behaviour would be observed, with breathing events having some short characteristic 
timescale; at smaller and larger twist angles, a critical point would be found, where the DNA 
exhibited scale-invariant breathing, and that at even more extreme twist angles, the open state 
would be stable. However, this is not the case at all, instead, we find 1/f behaviour at all 
twist angles. Since the emergence of this critical behaviour is not affected by the variation of 
the twist angle of the system parameters values, or by the careful tuning of other parameters, 
we describe this as self-organised criticality. 

The scale-free nature of the kinetics of breathing events at all twist angles described herein is 
strongly reminiscent of the behaviour of fluctuations in systems at criticality. Thus, it appears 
that without any tuning of the interaction parameters of the DNA strand, it is at a critical 
point where open bubbles spontaneously nucleate, hence we apply the term 'self-organized 
criticality'. Figures [3] and [7] suggest that there is an upper frequency (around log e u; = — 1), 
above which the amplitude of base-pair separation modes is small but ceases to decay any 
further, due to the effect of noise in the system. This cutoff is not due to the start of the 
phonon band (which occupies the range «/7 < w < \/j + 4&), and which corresponds to a 
relatively narrow range of velocities around 11 < u < 14 (precise values depend on the twist 
angle). 

We observe some artifacts of the phonon band in the region of logo; being between two and 
three and the defect mode near logu = 1, in that there is a shoulder in the power spectrum 
in Figure [5] where the trace is slightly larger than expected; however, no behaviour should 
be expected to persist over all scales, and the figures show good agreement with scale-free 
kinetic behaviour (straight-line) over the considerably large range of —6 < logu < 1. 

One might think that the defect site is the cause of the SOC breathing behaviour. However, 
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replacing the thymine (T) base with a difluorotoluene (F) base lowers the barrier between the 
closed and open states. The replacement does not affect the DNA structure or other behaviour, 
as discussed in several papers, for example, (37). Lowering the energy barrier allows breathing 
to occur at lower energies, and so occur with a higher frequency, on a timescale accessible 
to MD simulations (on the nanosecond timescale as opposed to the microsecond scale for a 
normal DNA sequence). There is no reason to suppose that a change in the frequency of 
events should cause a more significant change in qualitative behaviour. Hence, we speculate 
that in pure DNA, with no defect, but with multiwelled potentials between all corresponding 
base-pairs, curves such as that seen in Figure EJ will be repeated but that the crossover 
frequency (from w-independent noise to breathing with amplitude proportional to l/oo) will 
be shifted to much lower frequencies, namely the microsecond scale, which is beyond current 
MD simulations. Here we only have a double- well potential at the defect, the other inter-base 
interactions are all governed by harmonic potentials, in reality, all inter-base interactions are 
double-welled, this will allow an open base-pair to be the nucleus for a bubble of several 
consecutive open base-pairs to form, as the along-chain interactions would then ease the 
opening of neighbouring base-pairs. 
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